272 PART 5 Looking for Relationships with Correlation and Regression

as time goes on, you may want to perform a regression analysis to see whether the

upward trend is statistically significant (meaning not due to natural random

fluctuations). If it is, you may want to create an estimate of the annual rate of increase,

including a standard error (SE) and confidence interval (CI).

Some analysts use ordinary least-squares regression as described in Chapter 16 on

such data, but event counts don’t really meet the least-squares assumptions, so

the approach is not technically correct. Event counts aren’t well-approximated as

continuous, normally distributed data unless the counts are very large. Also, their

variability (standard deviation) is neither constant nor proportional to the counts themselves; for Poisson counts it grows as the square root of the mean. So

straight-line or multiple least-squares regression is not the best choice for event

count data.
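The point about variability can be checked numerically. The sketch below assumes the counts follow a Poisson distribution and sums over that distribution to get its mean and standard deviation (SD). The function names and the three expected counts are illustrative choices, not taken from the text.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    # Work on the log scale so large k does not overflow the factorial.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def poisson_mean_and_sd(mean):
    """Return (mean, SD) computed by summing over the Poisson distribution."""
    upper = int(mean + 20 * math.sqrt(mean) + 50)  # far enough into the tail
    probs = [poisson_pmf(k, mean) for k in range(upper + 1)]
    mu = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(probs))
    return mu, math.sqrt(var)

# Illustrative expected counts (assumed values, not data from the text).
for expected in (4.0, 25.0, 100.0):
    mu, sd = poisson_mean_and_sd(expected)
    print(f"expected count {expected:6.1f}: SD = {sd:.3f}, SD/mean = {sd / mu:.3f}")
```

The SD changes as the expected count changes, so it is not constant, and the ratio SD/mean shrinks as the count grows, so it is not proportional to the count either. Ordinary least squares expects the residual spread to be the same at every predicted value, and that is the expectation event counts fail to meet.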

Because independent random events like highway accidents should follow a Poisson

distribution (see Chapter 24), they should be analyzed by a kind of regression

designed for Poisson outcomes. And, surprise, surprise, this type of specialized

regression is called Poisson regression.
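To make the idea concrete, here is a minimal sketch of a Poisson regression of yearly event counts on time, written from scratch so that nothing is hidden inside a library. It rests on several assumptions that do not come from the text: the expected count is modeled on the log scale (a log link), there is a single predictor, the fit uses Newton-Raphson, and the ten yearly counts are invented for illustration. The function name `fit_poisson_trend` is hypothetical.

```python
import math

def fit_poisson_trend(years, counts, n_iter=25):
    """Fit log(expected count) = c0 + c1 * year by Newton-Raphson.

    Assumes a log link and one predictor; returns (c0, c1, se_c1).
    """
    def information(c0, c1):
        # Expected counts and the 2 x 2 Fisher information [[a, b], [b, d]]
        mu = [math.exp(c0 + c1 * x) for x in years]
        a = sum(mu)
        b = sum(m * x for m, x in zip(mu, years))
        d = sum(m * x * x for m, x in zip(mu, years))
        return mu, a, b, d

    c0, c1 = math.log(sum(counts) / len(counts)), 0.0  # start from a flat trend
    for _ in range(n_iter):
        mu, a, b, d = information(c0, c1)
        # Score: gradient of the Poisson log-likelihood
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * x for y, m, x in zip(counts, mu, years))
        det = a * d - b * b
        # Newton step, solving the 2 x 2 system directly
        c0 += (d * g0 - b * g1) / det
        c1 += (a * g1 - b * g0) / det

    _, a, b, d = information(c0, c1)
    se_c1 = math.sqrt(a / (a * d - b * b))  # from the inverse information matrix
    return c0, c1, se_c1

# Invented yearly accident counts, for illustration only (not real data).
years = list(range(10))  # years since the start of record keeping
counts = [10, 12, 11, 14, 16, 15, 19, 18, 22, 24]

c0, c1, se = fit_poisson_trend(years, counts)
low, high = c1 - 1.96 * se, c1 + 1.96 * se  # approximate 95% CI on the log scale
print(f"annual rate of increase: {math.exp(c1) - 1:.1%}")
print(f"95% CI: {math.exp(low) - 1:.1%} to {math.exp(high) - 1:.1%}")
print(f"z = {c1 / se:.2f}")
```

Under these assumptions, exponentiating the slope gives the multiplicative change in the expected count per year, which is why the annual rate of increase is reported as exp(c1) - 1. The confidence limits are computed on the log scale and then exponentiated. In practice you would let your statistical package do the fitting; the sketch only shows what kind of calculation sits behind the output.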

Introducing the generalized linear model

Most statistical software packages don’t offer a command or function explicitly

called Poisson regression. Instead, they offer a more general regression technique

called the generalized linear model (GLM).

Don’t confuse the generalized linear model with the very similarly named general

linear model. It’s unfortunate that these two names are almost identical, because

they describe two very different things. Now, the general linear model is usually

abbreviated LM, and the generalized linear model is abbreviated GLM, so we will use

those abbreviations. (However, some old textbooks from the 1970s may use GLM to

mean LM, because the generalized linear model, introduced in 1972, was not yet in common use.)

GLM is similar to LM in that the predictor variables usually appear in the model as

the familiar linear combination:

$$c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \dots$$

where the x’s are the predictor variables, and the c’s are the regression coefficients

(with c0 being called a constant term, or intercept).
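As a small illustration of that linear combination, the sketch below computes it for one subject. The function name `linear_predictor` and the coefficient and predictor values are hypothetical, chosen only to show the arithmetic.

```python
def linear_predictor(coefficients, predictors):
    """Return c0 + c1*x1 + c2*x2 + ... for one subject.

    `coefficients` starts with the intercept c0, followed by one
    coefficient per predictor.
    """
    intercept, *slopes = coefficients
    if len(slopes) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(c * x for c, x in zip(slopes, predictors))

# Hypothetical values: c0 = 1.0, (c1, c2, c3) = (0.5, -2.0, 0.25),
# and predictors (x1, x2, x3) = (4.0, 1.0, 8.0).
value = linear_predictor([1.0, 0.5, -2.0, 0.25], [4.0, 1.0, 8.0])
print(value)  # 1.0 + 2.0 - 2.0 + 2.0 = 3.0
```

Both LM and GLM build on this same quantity; they differ in how it is connected to the outcome.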

But GLM extends the capabilities of LM in two important ways:

» With LM, the outcome is assumed to be a continuous, normally distributed

variable. But with GLM, the outcome can be continuous or an integer. It can